Protein-network modeling of prostate cancer gene 
signatures reveals essential pathways in 
disease recurrence 

James L Chen, 1 Jianrong Li, 2 Walter M Stadler, 1 Yves A Lussier 2,3,4 



ABSTRACT 

Objective Uncovering the dominant molecular 
deregulation among the multitude of pathways 
implicated in aggressive prostate cancer is essential to 
intelligently developing targeted therapies. Paradoxically, 
published prostate cancer gene expression signatures of 
poor prognosis share little overlap and thus do not reveal 
shared mechanisms. The authors hypothesize that, by 
analyzing gene signatures with quantitative models of 
protein-protein interactions, key pathways will be 
elucidated and shown to be shared. 
Design The authors statistically prioritized common 
interactors between established cancer genes and genes 
from each prostate cancer signature of poor prognosis 
independently via a previously validated single protein 
analysis of network (SPAN) methodology. Additionally, 
they computationally identified pathways among the 
aggregated interactors across signatures and validated 
them using a similarity metric and patient survival. 
Measurement Using an information-theoretic metric, 
the authors assessed the mechanistic similarity of the 
interactor signature. Its prognostic ability was assessed 
in an independent cohort of 198 patients with 
high-Gleason prostate cancer using Kaplan-Meier 
analysis. 

Results Of the 13 prostate cancer signatures that were 
evaluated, eight interacted significantly with established 
cancer genes (false discovery rate <5%) and generated 
a 42-gene interactor signature that showed the highest 
mechanistic similarity (p<0.0001). Via parameter-free 
unsupervised classification, the interactor signature 
dichotomized the independent prostate cancer cohort 
with a significant survival difference (p=0.009). 
Interpretation of the network not only recapitulated 
phosphatidylinositol-3 kinase/NF-κB signaling, but also 
highlighted less well established relevant pathways such 
as the Janus kinase 2 cascade. 
Conclusions SPAN methodology provides a robust 
means of abstracting disparate prostate cancer gene 
expression signatures into clinically useful, prioritized 
pathways as well as useful mechanistic pathways. 



INTRODUCTION 

Gene signatures provide a glimpse into critical 
molecular pathways, as they essentially serve as 
a bridge between clinical phenotypes and genomics. 
As defined by Richard Simon, 'a multigene expression 
signature classifier is a function that provides 
a classification of a tumor based on the expression 
levels of the component genes. The classes are often 
good-risk or poor-risk, but classifiers can be defined 
to distinguish any set of classes for which a training 
set of cases exist for each class.' 1 These signatures 
have traditionally been derived by examining the 
differential expression of mRNA from discrete 
cancer states such as tumor versus normal tissue or 
high-grade versus low-grade tumors. Beginning over 
a decade ago with the identification of poor-risk 
breast cancer gene sets, 2 3 these gene signatures 
have rapidly proliferated to the point where nearly 
1000 entries exist in a gene signature database 
established to catalog them. 4 Surprisingly, despite 
their proliferation, few of these signatures have 
been commercialized and adopted by the medical 
community. In the USA, only one product in breast 
cancer, OncotypeDX, has achieved widespread 
adoption 5 ; however, newer tests such as a 'tumor of 
origin' assay for cancers of unknown primary may 
gain in popularity. In contrast, biomarkers such as 
prostate-specific antigen (PSA) in prostate cancer, 
HER2/Neu in breast cancer, and epidermal growth 
factor receptor (EGFR) in colon cancer have enjoyed 
rapid usage among practitioners with a multitude 
of clinical trials. 

Indeed, the vast majority of biomarkers are 
functionally and biologically understood, in stark 
contrast with gene signatures. Moreover, 
biomarkers tend to be single-pathway-specific, 
whereas gene signatures may span multiple mechanisms. 
To add to the confusion, distinct signatures 
rarely share genes even though they paradoxically occupy 
a common prognostic space. 7 Their similar 
efficiency in predicting poor clinical outcomes in 
new cohorts has led some observers, such as Joan 
Massagué in his 2007 New England Journal of 
Medicine editorial, to call for research into 'sorting 
out' these gene signatures and elucidating their 
common overlap. 8 Thus a critical problem for those 
in oncology has been determining whether these 
disjointed genetic signatures can 'jointly' provide 
a unified mechanistic rationale bridging both gene 
expression and clinical outcomes. 

To address this challenge, we have previously 
demonstrated that, by aggregating different, 
published genetic signatures of poor prognosis, we 
can reveal shared molecular pathways (for 
example, excess direct interactions with oncogenes 
and tumor suppressors) through the application of 
a network modeling technique termed single 
protein analysis of networks (SPAN). 9 SPAN, 
previously validated, 10 takes advantage of 
protein-protein interaction networks that have 
been used to generate robust clinical predictions in 
other tumor types. 9 11 In essence, SPAN uses as 
input a set of uncategorized protein interactions; as output, 
SPAN returns proteins that are more connected than can be 
expected by chance. The advantage of SPAN over purely 
expression- or literature-based methods of prioritization is that 
it will detect important proteins even if they are not overtly 
modified or amplified. 9 Thus SPAN provides critical information 
that may not be accessible through expression data alone. 

In this paper, we turn our attention to prostate cancer, as it 
faces a similar data prioritization problem. The treatment of 
prostate cancer has historically been centered around deregulation 
of the androgen receptor (AR) to effectively eliminate the 
effects of testosterone, the ligand for the AR. However, despite 
AR-specific targeted therapy, most patients eventually develop 
resistance to these agents. Consequently, multiple alternative 
pathways of 'poor prognosis' have been studied for therapeutic 
targeting, as many molecular mechanisms have been implicated 
in AR cross-talk, such as the Janus kinase (JAK)/STAT 12 and 
platelet-derived growth factor (PDGF) receptor pathways. 13 
There has been no integrative approach to elucidating the key 
regulatory pathways. Importantly, we believe that not only can 
we uncover key molecular pathways, but we can also generate 
gene signatures that are mechanistically coherent, that is, 
enriched for the same molecular pathways. 

While past computational approaches in prostate cancer have 
focused on ranking single gene targets among multiple diseases, 14 
we hypothesized that, using protein interactions, we could take 
advantage of the richness that gene signatures have to offer in the 
selection of molecular pathways that play essential roles in 
prostate cancer progression. To this end, we extracted a broad 
representation of poor-prognosis gene expression prostate cancer 
signatures from the literature (seed signatures). We then evaluated 
their individual protein interactions with known cancer 
genes curated by the Wellcome Trust Sanger Institute via SPAN. 
We further assembled the significant interactions of each 
signature. The result is what we term an 'interactor signature': a 
prioritized list of genes relating independent prostate cancer gene 
signatures. We evaluated this interactor signature in two ways: 
its internal mechanistic coherence using a novel application of 
information theory similarity and then its intrinsic ability to 
predict survival in a cohort independent of the seed signatures' 
cohorts. Finally, we added a qualitative evaluation of the 
signature against known prostate cancer pathways and current 
therapies. Taken together, we show that, through an extensive 
network analysis, prostate cancer gene expression signatures can 
be transformed into a set of prioritized pathways that ultimately 
provide a useful guide for therapeutic development. 

METHODS 
Datasets used 

Prostate cancer signatures 

We evaluated 12 previously published prostate gene signatures of 
poor prognosis 15-27 and a previously unpublished prostate 
cancer gene signature derived from a Mayo Clinic dataset 22 listed 
in table 1. Thus a total of 13 gene signatures were evaluated. 
Signatures were deliberately chosen to span various phenotypes 
(eg, high-grade tumor, stem cell nature) but unified in their 
ability to prognosticate either decreased overall survival or early 
disease relapse in prostate cancer datasets. These distinct specific 
phenotypic conditions are well-established biological or clinical 
indicators of aggressive malignancy. A full listing of the genes 
from the included prostate cancer signatures and their 
translation are available at 
http://lussierlab.org/publications/ProstateSignature. 



Cancer mechanism genes 

The Sanger Cancer Gene Census is a database maintained by the 
Wellcome Trust Cancer Genome Project, which contains 
a catalog of genes for which mutations have been causally 
implicated in cancer, acquired and updated through literature- 
based methods. We downloaded the Cancer Gene Census on 
October 9, 2009 from http://www.sanger.ac.uk/genetics/CGP. 

Protein-protein interaction network 

In brief, the protein interactions were downloaded from the 
Search Tool for the Retrieval of Interacting Genes version 8.0 on 
December 19, 2008 (STRING; http://string.embl.de). 29 STRING 
is a repository maintained by a European consortium of 
genomics facilities which contains known and predicted 
protein-protein interactions derived from such sources as 
high-throughput experiments, co-expression data, and literature. We 
extracted all human protein-protein interactions and retained 
those with a combined score of >900 (highly reliable score) that 
also had gene fusion, experimental, or database evidence. Thus 
interactions supported only by STRING text mining were filtered out. 
A total of 72 617 distinct interactions between 7681 distinct proteins 
were retained. 11 Proteins were considered to be nodes, and 
interactions between proteins were considered to be links. 
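
As an illustration only, the filtering rule described above can be sketched in Python. The record layout (keys `a`, `b`, `combined`, `fusion`, `experimental`, `database`, `textmining`) and the toy values are hypothetical simplifications introduced here; they are not the actual STRING file format, and this is not the code used in the study.

```python
from typing import Any, Dict, Iterable, List, Set, Tuple

# Evidence channels that qualify a link (text mining is deliberately absent).
EVIDENCE_CHANNELS = ("fusion", "experimental", "database")


def filter_interactions(records: Iterable[Dict[str, Any]],
                        min_combined: int = 900) -> Set[Tuple[str, str]]:
    """Keep links with combined score > min_combined and non-text-mining evidence."""
    kept: Set[Tuple[str, str]] = set()
    for rec in records:
        has_evidence = any(rec[channel] > 0 for channel in EVIDENCE_CHANNELS)
        if rec["combined"] > min_combined and has_evidence:
            # Store each undirected link once, independent of protein order.
            a, b = sorted((rec["a"], rec["b"]))
            kept.add((a, b))
    return kept


def build_network(links: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Adjacency map: proteins are nodes, retained interactions are links."""
    adjacency: Dict[str, Set[str]] = {}
    for a, b in links:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


# Hypothetical toy records (placeholder protein names and scores).
toy_records: List[Dict[str, Any]] = [
    {"a": "P2", "b": "P1", "combined": 990, "fusion": 0,
     "experimental": 800, "database": 900, "textmining": 700},
    # Same link listed in the opposite direction.
    {"a": "P1", "b": "P2", "combined": 990, "fusion": 0,
     "experimental": 800, "database": 900, "textmining": 700},
    # High score, but supported by text mining only: dropped.
    {"a": "P3", "b": "P4", "combined": 950, "fusion": 0,
     "experimental": 0, "database": 0, "textmining": 950},
    # Experimental evidence, but combined score too low: dropped.
    {"a": "P5", "b": "P6", "combined": 700, "fusion": 0,
     "experimental": 600, "database": 0, "textmining": 0},
]

links = filter_interactions(toy_records)
network = build_network(links)
```

Storing each link as a sorted pair is one simple way to count distinct interactions when a source lists both directions of the same link.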

Signature generation from the Mayo Clinic prostate cancer data 

The original Mayo Clinic signature was significantly smaller 
than other comparable signatures, and we thus recalculated 
a broader signature as follows. The Gene Expression Omnibus 
dataset GSE10645 was downloaded and analyzed using R/Bioconductor. 
We compared men who had biochemical relapse with 
systemic disease (bone or visceral disease) with those who did 
not. Genes with little to no change in expression levels were 
filtered using a covariance of expression parameter of 0.3. 
Significance analysis of microarrays 30 was then performed to 
obtain a gene signature with a false discovery rate (FDR) <5%. 

SPAN analysis of the prostate cancer gene signatures 

The SPAN method has been extensively described previously 9 11 
and expanded details can be found in the online supplemental 
methods. In brief, each prostate cancer gene signature was 
compared with the Sanger Cancer Gene set using SPAN. The 
observed number of interactions between the prostate cancer 
signatures and the Sanger cancer genes were derived and 
compared with an expected distribution through permutation 
resampling. The unadjusted p value of each signature gene's 
connectivity was further adjusted for multiplicity using 
Bonferroni-type methods. A converse calculation was performed 
where each single Sanger cancer gene was analyzed for its total 
number of interactions with each independent, unique gene in 
the amalgamated prostate cancer signatures independently and 
assigned a p value and a Benjamini-Hochberg FDR. Prioritized 
genes and their interactors that had an FDR <5% were retained. 
The resulting statistically significant genes were then aggregated 
to form an 'interactor signature'. As each SPAN protein keeps an 
equal number of partners in the empirical distribution (constant 
node degree), 'hub proteins' are statistically prioritized using 
conservative controls. See figure 1 and the online supplemental 
methods for details of interactor signature assembly from seed 
signatures. The resulting network was then displayed in Cytoscape, 31 
where Sanger cancer genes that are also members of the 
expression signature gene lists are clearly represented. Further, 
when SPAN-prioritized Sanger cancer genes from individual 
signature genes also overlap between these signatures, these 
shared known cancer interactions common among signatures 
serve as a 'quasi-gold standard' because of the very high 

Table 1 Prostate cancer gene signatures evaluated 

Columns: Signature No; Phenotype; Samples used to generate signature; Author; Available genes; In Network* 

1. Aggressive disease 
   Samples: Divided 66 microdissected prostate cancer specimens into two groups based on clinical aggressiveness defined as prostate-specific antigen (PSA) relapse following radical retropubic prostatectomy (RRP), distant metastasis, or cancer invasion into adjacent organs 
   Author: Yu et al 28 

2. Benign versus cancerous prostate tissue 
   Samples: Proteomic screen of microdissected prostate tissue embedded in tissue microarray for genes that best discriminated between benign, localized prostate cancer and metastatic disease 
   Author: Bismar et al 18 

3. High-grade tumor 
   Samples: Examined 12 microdissected RRP Gleason pattern 3 specimens compared with that of 12 Gleason pattern 4 and eight Gleason pattern 5 
   Author: True et al 27 

4. PTEN pathway/poor prognosis 
   Samples: Comparison of 35 phosphatase and tensin homolog (PTEN) negative and 70 PTEN positive based on immunohistochemistry from stage II estrogen receptor status-matched breast cancer specimens. Signature subsequently validated in a historic dataset of 79 prostate cancer specimens 19 described above in the Sun et al signature 
   Author: Saal et al 24 

5. Recurrence signature in solid tumors 
   Samples: Comparison of gene expression signature from 12 metastatic adenocarcinoma nodules from prostate and five other tissue types compared to 64 primary adenocarcinomas from primary tumors 
   Author: Ramaswamy et al 23 

6. Recurrent/aggressive disease 
   Samples: Evaluated 62 primary prostate tumors, 41 normal prostate specimens and nine lymph node metastases to develop a two-gene model of recurrence 
   Author: Lapointe et al 21 

7. Recurrent disease 
   Samples: 79 patient RRP specimens from patients with clinically localized prostate cancer. 39 cases with recurrence defined as three consecutive elevations in PSA for at least 5 years 
   Author: Sun and Goodison 21 

8. Recurrent disease 
   Samples: Using 21 prostate cancer samples, five genes using k-nn clustering were identified. 
   Author: Singh et al 25 

9. Recurrent disease/High-Gleason score 
   Samples: 512 candidate genes were analyzed for correlation with Gleason score from 71 patient RRP specimens (16 patients with relapsed disease defined as two consecutive PSA elevations over 84 months) 
   Author: Bibikova et al 16 

10. Relapse-free survival 
   Samples: Using 21 prostate cancer samples from Singh et al, the authors identified three signatures of recurrence and subsequently validated these signatures on a set of 79 tumors 
   Author: Glinsky et al 19 

11. Stem cell nature 
   Samples: Comparison of CD133+/α2β1hi cell culture specimens from 12 human prostate cancers compared with eight CD133−/α2β1low specimens. 
   Author: Birnie et al 17 

12. Systemic disease after relapse, Sig 1 
   Samples: 213 patients with prostate cancer PSA relapse and no evidence of systemic disease (defined as a positive bone scan or CT scan) were compared with 213 patients with prostate cancer with PSA relapse 
   Author: Nakagawa et al 22 

13. Systemic disease after relapse, Sig 2 
   Samples: Reanalysis of the above Nakagawa et al (Mayo Clinic dataset) as described in the Methods section 

Available genes (values in source order): 26, 12, 85, 184, 17, 16, 4/4/5, 22, 17, 133 

In Network* (values in source order): Yes, Yes, Yes, Yes, Yes, No, No, No, Yes, No, Yes, No, Yes 

*In Network indicates that the listed gene signature connects to the Sanger cancer genes via SPAN and composes part of the interactor signature. 



statistical and biological significance of such an occurrence. 
Visualization of statistically significant pathways using the 
Kyoto Encyclopedia of Genes and Genomes (KEGG) provided 
a further unbiased evaluation of reference gold-standard 
pathway genes and their associated networks. 32 
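
The permutation idea behind the SPAN analysis described in this section can be illustrated with a simplified Python sketch. It is not the published SPAN implementation: the null model below (a protein keeps its node degree while its partners are redrawn uniformly at random from the rest of the network), the function name `connectivity_p_value`, the add-one p value convention, and the toy network are all assumptions made for illustration.

```python
import random
from typing import Dict, Set, Tuple


def connectivity_p_value(protein: str,
                         adjacency: Dict[str, Set[str]],
                         target_set: Set[str],
                         n_perm: int = 1000,
                         seed: int = 0) -> Tuple[int, float]:
    """Empirical p value for the number of partners that fall in target_set.

    The node degree of `protein` is held constant in every permutation;
    only the identity of its partners is resampled.
    """
    rng = random.Random(seed)
    partners = adjacency[protein]
    degree = len(partners)
    observed = len(partners & target_set)
    candidates = sorted(p for p in adjacency if p != protein)
    at_least = 0
    for _ in range(n_perm):
        sampled = rng.sample(candidates, degree)
        if sum(1 for p in sampled if p in target_set) >= observed:
            at_least += 1
    # Add-one correction keeps the estimate strictly above zero.
    return observed, (at_least + 1) / (n_perm + 1)


# Hypothetical toy network: HUB touches all three target proteins,
# and 17 further proteins (N0..N16) have no links at all.
toy_targets = {"T1", "T2", "T3"}
toy_adjacency: Dict[str, Set[str]] = {
    name: set()
    for name in ["HUB", "T1", "T2", "T3"] + [f"N{i}" for i in range(17)]
}
for partner in toy_targets:
    toy_adjacency["HUB"].add(partner)
    toy_adjacency[partner].add("HUB")

hub_observed, hub_p = connectivity_p_value("HUB", toy_adjacency, toy_targets)
```

Per-protein p values obtained this way would still need the multiplicity adjustment described above before any protein is retained.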



Gene Ontology enrichment of the interactor signature 

Using FuncAssociate 2.0 software, 33 we evaluated the resulting 
interactor signature genes for common molecular processes and 
biological functions from annotations found in Gene Ontology 
(GO). GO annotations that were statistically over-represented 

Figure 1 Representative assembly of 
the protein network from disparate gene 
signatures. As shown in (A), signature 
1 gene/proteins (blue circles) do not 
connect directly with signature 2 
proteins (green circles). Protein 
interaction networks can independently 
link (solid line) each gene signature to 
a common set of cancer genes from the 
Sanger database (red triangles in B). 
Subsequently, using single protein 
analysis of network (SPAN), only 
protein-protein interactions with 
a false discovery rate (FDR) <0.05 are 
retained in each signature and an 
aggregate of interactors is assembled 
from all SPAN analyses of each 
signature, thus generating a composite 
network with a FDR <1% ((C) large 
shapes=FDR <5%, small shapes=FDR 
>5%). Prognostic gene expression 
signatures are represented as squares, 
and their respective genes are related 
with dotted lines. 
Panel titles: (A) Gene signatures: no direct interactions between signatures in protein interaction network. (B) Independent SPAN analyses linking gene signatures and Sanger cancer genes (SPAN 1, SPAN 2). (C) Proteins with FDR <0.05 by SPAN retained and networks merged; joint FDR of entire network <0.001. 



with a FDR <1% were noted. A local minimum algorithm was 
then used to identify more informative GO terms. 11 

Evaluating pathway similarity among sets of gene signatures 

In order to determine if sets of genes comprised related or 
unrelated molecular pathways, we calculated a metric of information 
theoretic similarity (ITS) applied to GO that we and 
others have previously validated. 10 Among the approaches used 
to estimate similarity between gene functions, those derived 
from information theory and GO are considered robust and state 
of the art, 34 and we have previously demonstrated their utility 
in calculating the similarity between breast cancer expression 
signatures. 9 We performed two evaluations using this metric. 
First, we examined the similarity of annotations within each 
seed signature geneset (lists of its genes) and the interactor 
signature in the context of GO to derive an ITS score by 
examining each gene-gene distance using the information 
theoretic distance. The information theoretic distance of each 
gene pair was based on the genes' respective annotations in 
GO. We then took the summation of all the scores of the unique 
gene pairs. We divided this total score by the number of genes in 
the signature. These scores were not further normalized, as we 
used an information theoretic distance, not the gene expression 
level, between the genes. Information theoretic distances are 
calculated as a continuous variable between 0 and 1. Therefore 
all measurements are within the same scale, and in our estimation 
did not require further normalization. Scores were 
calculated for the interactor signature and for the original gene 
signatures. To control the ITS for length of signature, we then 
generated an empiric distribution by using a bootstrap, 
resampling without replacement, of genes from the protein 
network for each signature; we selected the same number of 
signature genes, calculated the ITS, and repeated this procedure 
10 000 times. We then observed the rank of the original gene 
signature ITS score within the individual empirical distributions 
and calculated a p value (reported in table 2). 
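
The scoring and resampling steps above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' code: pairwise values are supplied as a precomputed lookup (`pair_similarity`, a hypothetical name) in which higher values mean more similar gene pairs, pairs missing from the lookup count as 0, and the p value uses an add-one convention rather than the exact ranking procedure of the study.

```python
import random
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, Tuple


def its_score(genes: Iterable[str],
              pair_similarity: Dict[FrozenSet[str], float]) -> float:
    """Sum of scores over unique gene pairs, divided by the number of genes."""
    unique = sorted(set(genes))
    total = sum(pair_similarity.get(frozenset(pair), 0.0)
                for pair in combinations(unique, 2))
    return total / len(unique)


def empirical_p_value(signature: Iterable[str],
                      network_genes: Iterable[str],
                      pair_similarity: Dict[FrozenSet[str], float],
                      n_resamples: int = 10000,
                      seed: int = 0) -> Tuple[float, float]:
    """Compare the observed score with same-size gene sets drawn from the network."""
    rng = random.Random(seed)
    signature = sorted(set(signature))
    observed = its_score(signature, pair_similarity)
    pool = sorted(set(network_genes))
    at_least = 0
    for _ in range(n_resamples):
        # Draw the same number of genes without replacement.
        sampled = rng.sample(pool, len(signature))
        if its_score(sampled, pair_similarity) >= observed:
            at_least += 1
    return observed, (at_least + 1) / (n_resamples + 1)


# Hypothetical toy data: G0, G1 and G2 are mutually similar; all other pairs score 0.
toy_pool = [f"G{i}" for i in range(10)]
toy_similarity = {frozenset(pair): 0.9
                  for pair in combinations(["G0", "G1", "G2"], 2)}
toy_observed, toy_p = empirical_p_value(["G0", "G1", "G2"], toy_pool,
                                        toy_similarity, n_resamples=2000)
```

Drawing sets of the same size as the signature is what controls for signature length: a longer signature is compared only with equally long random gene sets.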

Generation of a prioritized phenotype-pathway map 

We examined the connectivity of the interactor signature to 
itself using SPAN. Significant proteins with an FDR <0.05 were 
retained. We then overlaid KEGG pathway data on to the 
resultant protein network using the DAVID tool 35 (adjusted 
p value <0.05). The final annotated protein network is what is 
considered to be the phenotype-pathway map. This pathway 
map was then reviewed against the literature for its 
biomolecular mechanistic relevance to prostate cancer progression 
and prognosis. 

Survival analysis 

To test the clinical relevance of the interactor signature, we 
examined its ability to find a survival difference in a large and 
independent retrospective dataset of 281 Swedish men who 
underwent a course of 'watchful waiting' after being diagnosed 
with prostate cancer (GSE10645). 22 This survival analysis using 
a separate dataset serves as a type of clinical evaluation of the 
interactor signature. This set of 281 only included patients who 
were alive or had died from prostate-cancer-specific causes. For 
each patient, gene expression levels from the interactor signature 
were totaled to develop a per-patient score. Patients were placed 
in one of two groups according to whether their score was 
above or below the mean score. Kaplan-Meier analysis was then 
performed using time from diagnosis until death. In a second 
analysis, only patients with undisputed disease (Gleason scores 
of 7, 8, or 9) were included for analysis. 
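
The scoring and grouping steps of this survival analysis can be sketched in Python as follows. The function names, the handling of a score exactly equal to the mean (assigned to the lower group), and the toy values are assumptions for illustration; the Kaplan-Meier estimator is the standard product-limit form and is not taken from the software used in the study.

```python
from typing import Dict, List, Sequence, Set, Tuple


def signature_scores(expression: Dict[str, Dict[str, float]],
                     signature_genes: Set[str]) -> Dict[str, float]:
    """Per-patient score: total expression of the available signature genes."""
    return {patient: sum(value for gene, value in profile.items()
                         if gene in signature_genes)
            for patient, profile in expression.items()}


def split_by_mean(scores: Dict[str, float]) -> Tuple[List[str], List[str]]:
    """Return (above-mean patients, remaining patients)."""
    mean = sum(scores.values()) / len(scores)
    high = sorted(p for p, s in scores.items() if s > mean)
    low = sorted(p for p, s in scores.items() if s <= mean)
    return high, low


def kaplan_meier(times: Sequence[float],
                 events: Sequence[int]) -> List[Tuple[float, float]]:
    """Product-limit survival estimate at each distinct event time.

    events[i] is 1 for a death and 0 for a censored observation.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve: List[Tuple[float, float]] = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / n_at_risk
            curve.append((t, survival))
        n_at_risk -= leaving
    return curve


# Hypothetical toy data (placeholder gene and patient names).
toy_expression = {
    "p1": {"A": 1.0, "B": 2.0, "X": 9.0},
    "p2": {"A": 3.0, "B": 4.0},
    "p3": {"A": 0.5, "B": 0.5},
}
toy_scores = signature_scores(toy_expression, {"A", "B"})
toy_high, toy_low = split_by_mean(toy_scores)
toy_curve = kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1])
```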

Qualitative validation of interactor genes, connections, and network 

To establish whether the genes, interactions, and network 
prioritized via our analyses were relevant in prostate cancer, 
multiple reference sources were queried. (i) PubMed literature 
searches restricted to 2000 through 2010 were run with the 
target genes and their interactions of interest and the keyword 
'prostate'. Relevant and high priority pathways were then 
identified and reported. (ii) Genes were analyzed by Ingenuity 
Pathway Analysis (Ingenuity Systems; http://www.ingenuity.com) 
to observe literature-based connections among the genes 
and canonical sub-networks. (iii) The ClinicalTrials.gov website was 
queried on December 1, 2010 for each SPAN prioritized gene in 
the second interactor signature to observe whether relevant 
clinical trials were ongoing or planned. 

RESULTS AND DISCUSSION 

Prostate cancer gene signatures are tightly interwoven and have 
more interacting partners than expected by chance 

We began with 13 prostate cancer signatures that all had 
statistically significant worsened survival outcomes in independent 
cohorts of patients with prostate cancer. Eight gene 
signatures among the 13 met the compound connectivity 
criteria of (i) FDR <5% and (ii) having two or more interactors 
between the gene expression signature and the Sanger cancer 
genes (Methods). Of note, traditional measures of prostate 
cancer aggressiveness are based on the tumor morphology or 
grade, and thus four of the signatures examined this specifically: 
(i, ii) benign versus cancerous prostate tissue, 18 (iii) high-Gleason 
score and (iv) high-grade tumor. Recurrent disease is by 
definition already more aggressive, and multiple gene expression 
profiles were derived from tumors with this phenotype: (v, vi, 
vii) recurrent disease, 19 25 26 (viii) recurrence signature in solid 
tumors, 23 (ix, x) systemic disease after relapse 22 and recalculated 
systemic disease after relapse (Mayo Clinic dataset), and (xi) 
aggressive disease, which included patients who relapsed after 
primary therapy. 28 The last two signatures are based on principles 
of the cancer biology of aggressiveness: (xii) more 
primitive-appearing cancers that are stem cell in nature 17 and (xiii) 
cancers that have a known phosphatase and tensin homolog 
(PTEN) deregulation (PTEN pathway). 24 Please refer to 
table 1 for full details. 

Using SPAN methodology, we evaluated whether any genes 
from the prostate cancer gene signatures were significantly 
connected to the Sanger cancer genes curated by the Wellcome 
Trust Cancer Genome Project. We also examined whether there were 
Sanger genes that were significantly connected to each gene 
signature. In total, 42 genes were statistically significant with 
an FDR <5% and met criteria for having at least two interacting 
partners (table 1, online supplemental table 1). We call these 42 
genes the 'interactor signature'. Eight of the 13 gene signatures 
were connected via SPAN. 
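
The two retention criteria used above (FDR <5% and at least two interacting partners) can be illustrated with a short Python sketch. The Benjamini-Hochberg adjustment is the standard step-up procedure; the function names, the input layout, and the toy p values are hypothetical and are not results from this study.

```python
from typing import List, Sequence, Tuple


def benjamini_hochberg(p_values: Sequence[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def select_interactors(candidates: Sequence[Tuple[str, float, int]],
                       fdr_threshold: float = 0.05,
                       min_partners: int = 2) -> List[str]:
    """candidates holds (gene, unadjusted p value, number of interacting partners)."""
    q_values = benjamini_hochberg([p for _, p, _ in candidates])
    return [gene for (gene, _, partners), q in zip(candidates, q_values)
            if q < fdr_threshold and partners >= min_partners]


# Hypothetical toy input (placeholder gene names and made-up p values).
toy_candidates = [("geneA", 0.01, 3), ("geneB", 0.04, 2),
                  ("geneC", 0.03, 1), ("geneD", 0.5, 5)]
toy_q = benjamini_hochberg([p for _, p, _ in toy_candidates])
toy_selected = select_interactors(toy_candidates)
```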

We also examined the interactor signature genes' connectivity to 
other genes. As a check of our prioritization method, we believed 
that our interactor genes would have importance within a network 
context. To confirm this, we relied on work published by the 
Gerstein laboratory 36 who had identified specific network proteins 
as having biologically significant properties. They defined 'hubs' as 
proteins in the top 20% by number of neighbors, and 
'bottlenecks' as the proteins that are in the top 20% in terms of 
betweenness (connecting groups of proteins). In our network, 29 
(69%) proteins were bottlenecks, 25 (59%) were hubs, and 24 (57%) 
were both bottlenecks and hubs (online supplemental table 1). 
SPAN analyses are conservatively controlled for hubness. Each 
protein keeps its node degree (number of protein interactions) 
constant in each permutation, while its interactors are resampled. 
The fact that 57% were both hub and bottleneck proteins is in far 
excess of the baseline 10.14% in a random distribution of proteins 
from the network (p<0.0001, Fisher exact test) 3 This confirmed 
to us that, at a network structure level, our interactor signature 
identified critical players in poor-prognosis prostate cancer. The 
tightly interwoven nature of our interactor signature is readily 
evident in our graphical representation of its relationships (figure 2). 
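
A minimal Python sketch of the hub definition and of the enrichment comparison is given below. Several points are assumptions made for illustration: ties in degree are broken by name, the Fisher exact test is taken as one-sided and computed as a hypergeometric tail, and the count of 779 network proteins that are both hubs and bottlenecks is an approximation derived here from the reported 10.14% of 7681 proteins. Bottlenecks would be obtained analogously from betweenness, which is not computed in this sketch.

```python
from math import comb
from typing import Dict, Set


def top_fraction_by_degree(adjacency: Dict[str, Set[str]],
                           fraction: float = 0.2) -> Set[str]:
    """Proteins in the top `fraction` by number of neighbors (hubs)."""
    ranked = sorted(adjacency, key=lambda p: (-len(adjacency[p]), p))
    n_top = max(1, int(len(ranked) * fraction))
    return set(ranked[:n_top])


def hypergeom_tail(k: int, successes: int, draws: int, population: int) -> float:
    """P(X >= k) when `draws` items are taken without replacement from a
    population containing `successes` marked items."""
    total = comb(population, draws)
    upper = min(draws, successes)
    tail = sum(comb(successes, i) * comb(population - successes, draws - i)
               for i in range(k, upper + 1))
    return tail / total


# Hypothetical toy network: H links to nine proteins, and A also links to B.
toy_net: Dict[str, Set[str]] = {name: set() for name in "HABCDEFGIJ"}
for other in "ABCDEFGIJ":
    toy_net["H"].add(other)
    toy_net[other].add("H")
toy_net["A"].add("B")
toy_net["B"].add("A")
toy_hubs = top_fraction_by_degree(toy_net)

# 24 of 42 interactor proteins versus roughly 779 of 7681 network proteins.
enrichment_p = hypergeom_tail(24, 779, 42, 7681)
```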

Interactor signature genes are involved in cell cycle, PDGF and 
fibroblast growth factor (FGF) signaling, and phosphorylation 

We sought to characterize the predominant biomolecular functions 
of the selected 42 genes. To do this, we evaluated the 
functional annotations found in GO of this interactor signature. 
GO is essentially a hierarchical lexicon of terms used to describe 
genes. We determined whether these descriptors of biomolecular 
functions were enriched in our gene set. Highly significant 
(adjusted p value <0.0001) descriptors that were associated with 
this set of genes were terms related to several pathways, namely 
PDGF and FGF signaling. Also notable were annotations related 
to cell cycle regulation and phosphorylation. Full results of this 
GO enrichment are listed in online supplemental table 2. 

The 42-gene interactor signature prioritizes key pathways better 
than other prostate gene signatures 

To evaluate whether the genes in our interactor signature were 
more related to one another (ie, involved in the same molecular 
pathway or performed the same molecular function) than genes 

Figure 2 Combined network of 
prioritized signature genes and cancer 
proteins derived from single protein 
analysis of network (SPAN) protein 
interaction analysis conducted over 
each expression signature. Prostate 
cancer gene signatures of poor 
prognosis (large grey squares) were 
evaluated for their protein-protein 
interaction connectivity to the Sanger 
cancer genes curated by the Wellcome 
Trust Cancer Gene Atlas through SPAN 
methodology. Squares represent 
prostate cancer gene signatures, circles 
indicate network genes, and triangles 
indicate Wellcome Trust Sanger cancer 
genes. Red indicates statistically 
significant proteins (false discovery rate 
(FDR) <5%) with at least two 
interacting partners, and grey indicates 
non-prioritized proteins. Nodes on the 
outer circle indicate prostate cancer 
signature genes, and nodes in the 
innermost circle indicate proteins 
contributing to prioritize the statistically 
significant ones but for which the FDR 
>5%. Dashed lines indicate linkages 
between signature genes and their 
respective signatures, and solid lines 
indicate a protein interaction. 
Labels legible in the figure: Bibikova, True, Ramaswamy, Bismar (signatures); NFKB1 (protein); FDR <5%. 

in other prostate cancer gene signatures, we extended a method 
of evaluating the similarity of genes based on their shared 
annotation in GO. 11 We computed an ITS score, which 
evaluates the average similarity of a set of genes. Using this 
algorithm, we systematically evaluated the ITS between each 
pair of signatures including the 42-gene interactor signature and 
the 13 original prostate cancer gene signatures. Next, to correct 
for gene signature length and to calculate an empiric p value, 
we generated 10 000 bootstraps of a similar length gene 
signature derived from the protein network and then examined 
the rank of each gene signature among the bootstraps. The 
interactor signature ranked first in 10 000 bootstraps 
(p<0.0001) of a similar number of genes as demonstrated by 
table 2. The only other signature that was statistically 
significant was Bibikova's high-Gleason signature, which 
resulted in a significant but lower p value of 0.017. 

The SPAN-generated interactor signature has prognostic 
significance in newly diagnosed prostate cancer 

To evaluate the clinical relevance of the interactor signature, 
we performed our evaluation in a completely independent 
dataset, the Swedish Watchful Waiting Cohort. 37 In this study, 
281 men underwent a course of watchful waiting after diagnosis 
of prostate cancer. We asked whether interactor signature 
overexpression was able to distinguish a group with poorer 
survival. Of the genes in the interactor signature, 35 were 
available for analysis. We divided the patients into two groups 
on the basis of whether their mean gene expression was higher 
than the average of the entire cohort. Kaplan-Meier survival 
analysis of the two groups from the date of their diagnosis was 
performed. The log rank test gave a p value that approached 
significance at 0.052. Importantly, given the heterogeneity 
of prostate cancer, we were able to detect an even greater 
significance (p=0.009) when we only evaluated a subset of 198 
patients with high-grade prostate cancer (Gleason 7-10) 
(figure 3). 
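
For readers unfamiliar with the log rank test used above, a textbook two-group version can be sketched in Python. This is a generic illustration, not the software used for the reported p values; the function name and the toy survival times are hypothetical. The p value uses the chi-square distribution with one degree of freedom, whose upper tail equals `erfc(sqrt(x / 2))`.

```python
import math
from typing import List, Sequence, Tuple


def logrank_test(times_a: Sequence[float], events_a: Sequence[int],
                 times_b: Sequence[float], events_b: Sequence[int]
                 ) -> Tuple[float, float]:
    """Two-group log rank test; returns (chi-square statistic, p value)."""
    all_times: List[float] = list(times_a) + list(times_b)
    all_events: List[int] = list(events_a) + list(events_b)
    event_times = sorted({t for t, e in zip(all_times, all_events) if e})
    observed_a = expected_a = variance = 0.0
    for t in event_times:
        # Patients still at risk just before time t in each group.
        n_a = sum(1 for x in times_a if x >= t)
        n_b = sum(1 for x in times_b if x >= t)
        d_a = sum(1 for x, e in zip(times_a, events_a) if x == t and e)
        d_b = sum(1 for x, e in zip(times_b, events_b) if x == t and e)
        n, d = n_a + n_b, d_a + d_b
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0
    chi_square = (observed_a - expected_a) ** 2 / variance
    p_value = math.erfc(math.sqrt(chi_square / 2))
    return chi_square, p_value


# Hypothetical toy groups: every patient in group A dies before anyone in group B.
chi_sep, p_sep = logrank_test([1, 2, 3, 4, 5], [1, 1, 1, 1, 1],
                              [10, 11, 12, 13, 14], [1, 1, 1, 1, 1])
# Identical groups should show no difference.
chi_same, p_same = logrank_test([1, 2, 3], [1, 1, 1], [1, 2, 3], [1, 1, 1])
```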

SPAN analysis of the interactor signature emphasizes pathways 
of prostate cancer progression 

Our first SPAN analysis generated a set of highly connected 
genes (interactor signature) related to prostate cancer. A second 
SPAN analysis over the interactor signature allows us to 
prioritize molecular pathways vis-a-vis their protein interactions 
with one another. In other words, the first SPAN allowed us to 
identify disparate expression signatures interacting with 
common cancer proteins of the gold standard Sanger cancer 
genes (Methods). For the 42 key protein interactors thus 
generated, we then further annotated the most central ones in 
the network. The determination of centrality was performed via 
a second SPAN analysis over the interactor signature proteins 
that resulted in their prioritization (node size) and clarification 
of their interactions as shown in figure 4. To highlight 
established pathways, we overlaid canonical pathway information 
from KEGG 32 after calculating which of the pathways were 
represented at a statistically significant level (p<0.05). The 
result, when graphed, is what we call a phenotype-pathway 
map (figure 4). In this prostate cancer phenotype-pathway map 
of poor prognosis, seven of the original prostate cancer gene 
signatures form coherent subgroups that are consistent with 
established pathways. 
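
One common way to decide whether a pathway is represented at a statistically significant level is a one-sided hypergeometric (over-representation) test. The sketch below assumes that test for illustration; the text above does not state which test was used, and the function name and the numbers in the example are hypothetical.

```python
from math import comb


def pathway_enrichment_p(n_universe, n_pathway, n_selected, n_overlap):
    """Probability of seeing at least n_overlap pathway genes when
    n_selected genes are drawn without replacement from n_universe genes,
    of which n_pathway belong to the pathway (hypergeometric upper tail)."""
    upper = min(n_pathway, n_selected)
    total = comb(n_universe, n_selected)
    tail = sum(
        comb(n_pathway, k) * comb(n_universe - n_pathway, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Hypothetical numbers: 6 of 42 signature proteins fall in a pathway of
# 120 genes, against a background of 20000 annotated genes.
p_example = pathway_enrichment_p(20000, 120, 42, 6)
```

A pathway would then be overlaid on the map only if its p value falls below the chosen threshold; with many pathways tested, a multiple-testing correction would usually be applied as well.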

Our second SPAN and the resulting prostate phenotype-pathway 
map allow us to better understand the biological meaning of 
the interactor signature. By looking for dominant molecular 
mechanisms and highly connected genes, we can begin to 
untangle, and conjecture about, the key pathways of 
poor-prognosis prostate cancer. 
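
The idea of prioritizing a protein by how strongly it connects to a gene signature can be illustrated with a simple permutation test. This is a simplified stand-in and not the SPAN procedure itself (see Methods): the network, gene names, null model (random gene sets of equal size) and function names are all assumptions made for the example.

```python
import random


def count_links(protein, gene_set, network):
    """Number of genes in gene_set that interact directly with protein."""
    return len(network.get(protein, set()) & gene_set)


def empirical_p(protein, signature, network, universe, n_perm=1000, seed=0):
    """Empirical p value: how often a random gene set of the same size as
    the signature has at least as many links to protein as the signature."""
    rng = random.Random(seed)
    observed = count_links(protein, signature, network)
    pool = sorted(universe - {protein})
    k = len(signature)
    hits = sum(
        count_links(protein, set(rng.sample(pool, k)), network) >= observed
        for _ in range(n_perm)
    )
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)


# Hypothetical toy network: 'hub' interacts with g1, g2 and g3 only.
toy_universe = {"hub", "loner"} | {f"g{i}" for i in range(1, 21)}
toy_network = {"hub": {"g1", "g2", "g3"}}
toy_signature = {"g1", "g2", "g3"}

p_hub = empirical_p("hub", toy_signature, toy_network, toy_universe)
p_loner = empirical_p("loner", toy_signature, toy_network, toy_universe)
```

In this toy setting `hub` receives a small empirical p value and `loner`, which has no interactions, receives 1.0. Proteins passing a false discovery rate threshold across all candidates would be the ones drawn with larger nodes in a map such as figure 4.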

Table 2 Significance of pathway similarity among sets of gene signatures 

The phenotype-pathway map recapitulates 
phosphatidylinositol-3 kinase (PI3K)/NF-κB centrality to prostate 
cancer progression and highlights driver pathways 

Examining figure 4 in detail, we can see that the PI3K/NF-κB 
signaling cascade is central to this phenotype-pathway map, as 
it is a common end point for the various upstream signaling 
cascades. The role of PI3K/NF-κB in prostate cancer 
progression is well established and is believed to be a mechanism 
for cross-talk with the androgen receptor and thus is implicated 
in androgen independence. 38 This finding has been noted in 
prostate cancer by multiple observers. 39 In the qualitative 
discussion that follows, we recapitulate and prioritize major 
drivers of poor-prognosis prostate cancer and describe 
under-reported findings. As shown below in our review of the 
literature, the Janus kinase 2 (JAK2) and STAT1 findings were 
perhaps the most novel and under-reported. 





Figure 3 Kaplan-Meier analysis of the 42-gene interactor signature 
revealed a clinically significant signal. Genes from the interactor 
signature that were available for analysis (35 genes total) from an active 
surveillance study of prostate cancer were used for analysis. 198 
patients with high-grade (Gleason 7-10) disease were used, and overall 
survival from time of diagnosis was determined. The log rank test 
showed a significant survival difference in patients who had higher 
average expression levels of the genes of interest versus those who had 
lower average expression (p=0.009; Kaplan-Meier analysis). Asterisk 
(*) indicates lower expression of interactor signature. 



Feeding into the PI3K/NF-κB pathway are driver pathways 
that include (1) the PDGF signaling cascade, (2) FGF signaling, (3) 
interferon (IFN)γ signaling, and (4) the JAK/STAT pathway. When 
we consider the KEGG annotations of the pathways, we observe 
that the pathway 'regulation of actin cytoskeleton' (hsa:04010) 
encompasses FGF and PDGF signaling through the PI3K/NF-κB 
cascade. A second KEGG pathway, JAK/STAT (hsa:04630), 
captures IFN signaling. The importance of the JAK/STAT 
pathways is consistent with conclusions in a separate paper 
analyzing molecular profiling of prostate cancer stem cells. 17 

Key regulators of cell cycle derangements constitute 
a substantial portion of the phenotype-pathway map 

Consistent with the established role of PI3K/NF-κB in mitogenic 
activation, downstream proteins were nearly all associated with 
the cell cycle (hsa:04110) (figure 4B). Cell cycle kinases, 
regulatory proteins and proliferating cell nuclear antigen, a known 
marker of proliferation, 40 constitute the majority of the 
identified proteins. Cyclin D3 (CCND3) and its ligand, the tumor 
suppressor protein retinoblastoma 1 (RB1), were prioritized as 
part of cell cycle regulation. RUNX1, normally associated with 
acute myeloid leukemia (AML), was tightly associated with 
this sub-network as well. Previous work has demonstrated that 
RUNX1 cooperates with E-twenty six transcription factors to activate 
transcription in the setting of androgen deprivation. 41 

JAK2 is uniquely positioned in the phenotype-pathway map as 
an activator of the PI3K/NF-κB cascade 

Perhaps most interesting from a translational medicine 
perspective is the utility of the phenotype-pathway map in 
helping identify key genetic lynchpins. JAK2 is involved in 
cytokine receptor signaling and has been experimentally 
confirmed in prostate cancer. 12 Examination of figure 4A reveals 
that JAK2 is connected either directly or indirectly to nearly all 
the proteins that are upstream of the PI3K/NF-κB signaling 
cascade. We note the interplay of JAK2 with the FGF, PDGF and 
IFN pathways. Indeed, phosphorylation of the STAT3 oncogene 
via the FGF pathway is dependent on JAK2. 42 The STAT3 
oncogene in turn is believed to be downstream of PDGF and also 
activated via JAK2. 43 PDGF activation can then proceed through 
the PI3K/NF-κB pathway 44 to activate proliferation. Similarly, 
the proinflammatory cytokine, IFNγ, is traditionally thought to 
bind to the IFNγ receptor (partly encoded by IFNGR1) and then 
act via the JAK2/STAT1 pathway in a tumor suppressor role in 
prostate cancer. In fact, STAT1 activation may be a marker of 
derangement in the sensitivity of IFN signaling and has been 
associated with chemoresistance in the castrate setting. 46 In 
other work, Gu et al have published a series of papers 
characterizing the behavior of STAT3 and STAT5a/b transcription 
factors using in vivo and in vitro models, and recently asked 
whether JAK2 is the 'common denominator' for their dual 
activation in clinical prostate cancers. 12 Thus the prostate 
phenotype-pathway map highlights the centrality of JAK2 as 
a mediator of the prioritized pathways. Whether JAK2 
inhibition alone is sufficient to prevent the recurrence of prostate 
cancer or is a bona fide therapeutic target for advanced disease 
remains to be determined in clinical trials. 

Figure 4 Prostate phenotype-pathway map. A second single protein analysis of network (SPAN) was conducted over the network presented in 
figure 2 to prioritize a subset of the 42-gene interactor prostate signature of poor prognosis which revealed a tightly interwoven network (top panel; 
Methods). Proteins with an empiric false discovery rate (FDR) <0.05 were retained and are indicated by larger shapes. Significant KEGG 
pathways (FDR <0.05) were overlaid onto the network and colorized as indicated. Detail A and Detail B expand areas in the top panel that were 
simplified. Square shapes denote prognostic expression signatures with dotted lines to their associated gene; hexagons represent several proteins 
that are closely associated with one another and combined for purposes of simplicity of representation; triangles denote Sanger genes as compared 
with circle shapes which denote the protein products of signature genes. 

Bioinformatics can provide an 'executive synopsis' of relevant 
molecular pathways and points to potential drug targets 

This is the second study that confirms mechanistic overlap in 
relatively disjointed but prognostically congruent gene signatures, 
a paradox noted by Joan Massagué in 2007 8 that was solved with 
our previous publication. 9 We initially demonstrated that cancer 
genes, both oncogenes and tumor suppressors, were interacting 
with signature genes more often than expected from the empirical 
distribution in our network modeling of protein interactions. Although 
our previous study was conducted in the breast domain, this 
study corroborates our previous findings in the prostate arena but 
differs in two ways. First, we ensure the biological significance of 
our findings by introducing an information theory-based metric. 
Second, we provide a completely unbiased patient cohort to 
confirm the clinical relevance of our study. 

In this paper, we demonstrate, through established 
bioinformatics methods, that we can transform heterogeneous and 
complex prostate cancer gene signatures of poor prognosis into 
a clinically meaningful 'executive synopsis' of the most relevant 
pathways and critical genes such as JAK2. There are 
undoubtedly many such genes. Table 3 provides a summary of the 
members of the phenotype-pathway map and their exploration as 
potential therapies. A comparison list of drugs and drug targets 
derived from KEGG can be found in online supplemental table 3. 
As shown, these in silico results mirror relevant prostate cancer 
pathways that have been found and verified experimentally in 
vitro/in vivo. 

Table 3 Phenotype-pathway map genes and their stage of clinical 
drug development within prostate cancer (source: http://ClinicalTrials.gov 
and PubMed data as of December 2010) 

We also learned that the best discriminatory genes for an 
expression signature do not necessarily make the best input into 
the SPAN network. The 17-gene Nakagawa signature, which 
was developed using a non-parametric supervised learning 
method, ultimately did not connect to the network. In contrast, 
a 132-gene signature derived in an unbiased manner from the 
same dataset connected to nearly all the other signatures via 
SPAN. Thus genes that have the greatest degree of 
discriminatory ability may indeed be 'passenger' genes rather than 'driver' 
genes. For purposes of SPAN analysis, we believe it is better to 
pursue the most unbiased gene signature to identify a larger 
grouping of statistically relevant genes and allow the protein 
network to perform the filtering. In other words, a larger set of 
genes is more likely to comprise the 'drivers' of cancer 
mechanisms from which SPAN can assert protein interactions, rather 
than the 'passenger' genes that simply correlate with outcome 
and thus cannot contribute mechanisms in network models. 

LIMITATIONS 

By design, we used a simple, single protein network interaction 
model to calculate p values, as its significant results are easier 
for clinicians and biologists to interpret. However, more sensitive 
and powerful network modeling, such as diffusion kernels, 59 is 
likely to yield additional insight. Furthermore, we conducted the 
analysis using STRING version 8.0, which contains a limited 
number of interactions and thus limited the study to this subset 
of proteins. While the computational controls show that the 
observed network signature is highly statistically significant, the 
qualitative evaluation of the results relies on previously published 
data. We are therefore beholden to the different methodologies 
and the multitude of oligonucleotide arrays used to derive the 
gene signatures. We have attempted to overcome this by 
carefully incorporating multiple independent gene signatures and 
using stringent statistical cut-offs to ensure a conservative 
evaluation. Indeed, each seed signature SPAN analysis required 
an FDR <0.05 and more than one interactor; this suggests that 
the relevant interactors of this network signature (spanning 
multiple seed signatures) more likely have a significance of FDR 
<<0.05. Additionally, the Kaplan-Meier analysis was 
performed in a dataset completely independent of all the 
signatures and therefore provided an unbiased evaluation. 

The protein-interaction network and the Sanger cancer genes 
are not static but vibrant growing entities. As we learn more 
about factors contributing to prostate cancer, undoubtedly there 
will be additions and variations to the phenotype-pathway 
map. Nevertheless, the intent of this study was to provide and 
explore a tool for understanding gene expression signatures 
quickly at this moment in time. Going forward, we do intend to 
rerun these analyses with updated and expanded lists of gene 
signatures. Furthermore, as we have shown that different 
interactors prioritized in distinct seed signatures may be related 
to the same oncogene by interactions, this fact suggests that 
novel methods should be developed to produce expression 
classifiers where the interaction is investigated ab initio rather than 
a posteriori. Such an approach could be designed to identify, 
across samples, significant, yet distinct, interactors to an 
oncogene, thus promoting within the principles of personal genomics 
a fundamental paradigm shift from the current cohort-wide 
requirements. 

In summary, the phenotype-pathway map provides an 
excellent starting point for developing rational clinical trial 
designs, as it can inform researchers about what therapy should 
be attempted first that may be helpful for the largest number of 
patients. To this end, we are working on translating our 
prioritized pathway findings into the clinical setting; simultaneously, 
we are extending this technique to other tumor types. As more 
knowledge accumulates about oncogenes and gene signatures, 
reanalysis by this technique may reveal new pathways and 
interconnections that were heretofore unknown or understudied. 

CONCLUSION 

By analyzing multiple prostate cancer signatures of poor 
prognosis, we have uncovered seven highly connected cancer genes 
that were not among our original gene signatures of poor 
prognosis. This further confirms our hypothesis that, while 
multiple genes in a high-throughput analysis may change along 
with the activity of oncogenes or tumor suppressors, the critical 
information contained in direct physical interactions among 
proteins is not accessible via expression arrays. As a result, 
a multiscale approach incorporating gene expression data and 
protein interaction networks can elucidate otherwise neglected 
targets and underlying molecular sub-networks underpinning 
the phenotypic concordance of genetically disparate gene 
signatures. At the gene expression signature level, the pathways 
are not apparent. However, the interactor signature not only 
prioritizes biological mechanisms underpinning multiple 
signatures, it also recapitulates in good part known pathways 
involved in prostate cancer oncogenesis. Indeed, the 
phenotype-pathway map generated by our interactor signature truly 
recapitulates and underscores the centrality of the PI3K/NF-κB 
pathway and other known mechanisms for prostate cancer 
progression. Moreover, through a systems biology approach, we 
are able to prioritize less well-established pathways, such as 
JAK2, that may ultimately serve as attractive drug targets. From 
seed signatures generated at the cohort level, we have 
demonstrated a posteriori that expression changes in direct, yet 
distinct, interactors to oncogenes correlate with prognosis. Thus 
we propose that ab initio design of mechanistically anchored 
gene expression classifiers is more likely than current 
cohort-level classifier approaches to be sensitive to individual variation 
in personal genomics. 